ABSTRACT

The edumetric adequacy of curriculum-based reading
measures was examined for two basal reading programs. On the basis
of reading-aloud performance, 91 elementary students were assigned
seven instructional placement scores within each basal series.
Students also were measured on standardized reading achievement
tests. Generally, correlations between instructional scores within
each series and performance on standardized tests were high and
similar, providing evidence that the curriculum-based reading
measures are valid with respect to technically adequate standardized
tests; however, validity was dependent on the placement criteria
employed. Additional analyses revealed other important edumetric
effects of using different placement criteria. The technical adequacy
of curriculum-based reading measurement is discussed along with
recommendations for developing instructionally useful measurement
procedures. (Author)



The Relationship Between Curriculum-based Mastery Measures

and Standardized Achievement Tests in Reading

Since the passage of P.L. 94-142 in 1975, and with the increasing
demand for accountability in the schools, educators have been required
to support empirically their decisions that affect handicapped children.
Because it provides the essential documentation and information for
making decisions about pupils, measurement represents a critical
component of the educational process (Glaser & Nitko, 1971; Ysseldyke, 1979).
Unfortunately, the measurement systems available for gathering assessment
information fail to provide all of the essential decision-making data
while maintaining demonstrated technical adequacy (cf. Thorndike, 1971).
Therefore, the data base on which educational decisions are made is
typically less than satisfactory.

Pre-post administration of norm-referenced achievement tests is the
most commonly employed measurement format (Anastasi, 1976; Glaser &
Nitko, 1971; Tyler, 1951). Yet norm-referenced achievement tests suffer
from the poor reliability characteristic of difference scores (Stanley,
1971), are unsuitable for ongoing monitoring of the appropriateness of
educational programs (Jenkins, Deno, & Mirkin, 1979), and frequently lack
content validity with respect to a student's curriculum (Armbruster,
Stevens, & Rosenshine, 1977; Eaton & Lovitt, 1972; Jenkins & Pany, 1978).
Following is a brief discussion of these limitations and of alternative
measurement formats that might resolve these problems and improve the
data base on which educational decisions are made.
Limitations of Pre-Post Testing on Standardized Achievement Tests 

Pre-post testing on standardized achievement tests is of limited use
for the purpose of making educational programming decisions. Stanley
(1971) demonstrated that pre- and posttesting on the same or similar
tests leads to low reliability of an examinee's difference score. When
the correlation between pre- and posttests is high, there is great
overlap between the true scores of the examinees, so a high proportion of
the obtained difference score is error. Additionally, when pre-post
testing is employed, a determination of program effectiveness is
made at the end of the treatment period. This summative evaluation
prevents the educator from employing the measurement data to improve
the student's program throughout the treatment period.
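Stanley's point can be made concrete with the classical test-theory formula for the reliability of a difference score D = Y - X, where r_xx and r_yy are the pretest and posttest reliabilities and r_xy is the pre-post correlation. The Python sketch below is illustrative only; the reliabilities and correlations in the example are hypothetical values, not data from this study.

```python
def difference_score_reliability(r_xx, r_yy, r_xy, sd_x=1.0, sd_y=1.0):
    """Reliability of the difference score D = Y - X under classical test theory.

    r_xx, r_yy: reliabilities of pretest X and posttest Y
    r_xy:       correlation between pretest and posttest
    sd_x, sd_y: standard deviations of X and Y (equal by default)
    """
    # True-score variance of D (errors assumed uncorrelated across occasions).
    true_var = sd_x**2 * r_xx + sd_y**2 * r_yy - 2 * r_xy * sd_x * sd_y
    # Observed-score variance of D.
    obs_var = sd_x**2 + sd_y**2 - 2 * r_xy * sd_x * sd_y
    return true_var / obs_var


# Hypothetical example: two highly reliable tests (.90 each).
for r_xy in (0.50, 0.70, 0.80):
    print(r_xy, round(difference_score_reliability(0.90, 0.90, r_xy), 2))
```

With both tests at .90, raising the pre-post correlation from .50 to .80 lowers the reliability of the difference score from .80 to .50, which is the pattern Stanley described.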

As an alternative, students can be measured at regular, frequent
points within the treatment period to formulate ongoing or formative
decisions concerning program effectiveness. Unfortunately, achievement
tests lack utility for such frequent checks on program effectiveness
because (a) they are too long to be administered regularly, and (b)
with frequent use, children inadvertently learn tests that typically
have a limited number of alternate forms.

Another problem of standardized achievement tests is that they
frequently lack content validity with respect to a student's curriculum.
Because the content of a basal reader may be unevenly represented in
different tests, a student's obtained score may be dependent on the
choice of reading test rather than on actual student achievement.
Several studies illustrate the potential lack of consonance between a
test's content and a pupil's curriculum. For example, Jenkins and
Pany (1978) used several popular basal readers and standardized
achievement tests to look at the relationship between a student's reading
curriculum and test performance. They computed a hypothetical Grade
Equivalent Score (GES) for each test based on the assumption that a
student had learned the words taught in a particular reader. The
GESs produced by this procedure indicated that achievement tests
are differentially sensitive to vocabulary taught in a particular
curriculum. For instance, a beginning second grader who had learned
the vocabulary in the Macmillan series would score seriously below
grade level on the Word Analysis portion of the Metropolitan Achievement
Test, but at or above grade level on the other tests. This
differential sensitivity of achievement tests undermines their utility in
making educational decisions.

Differential sensitivity of achievement tests is not confined to
instruments that focus on word recognition skills. Armbruster, Stevens,
and Rosenshine (1977) compared the content of two reading achievement
tests, the Metropolitan Achievement Test (MAT) and the California
Achievement Test (CAT), with the content of the reading comprehension
exercises in the Economy, Ginn 360, and Houghton-Mifflin reading series.
They found that test items of the MAT and CAT failed to cover roughly 64%
of Economy, 65% of Ginn, and 79% of Houghton-Mifflin reading
comprehension exercises. Furthermore, there were large differences in the
relative emphases between the reading series and the achievement tests.
Based on the percentage of different types of reading comprehension
exercises and the percentage of items tapping these different
comprehension skills, the Ginn series correlated .10 with the MAT, while the
Houghton-Mifflin series correlated .42 with the CAT.
The findings of Jenkins and Pany and of Armbruster et al. were
corroborated by research based on actual student performance data
(Eaton & Lovitt, 1972). Eaton and Lovitt reported large
inconsistencies in the scores obtained by the same learning disabled students
on different achievement tests, and in the degree to which those
achievement tests reflected the students' actual performance in
particular reading series. On the basis of these studies, one can
conclude that standardized achievement tests are unevenly sensitive
to student progress and relative standing in varying curricula, and
that the use of different tests might result in different educational
decisions.

Criterion-Referenced Assessment as an Alternative 

In contrast to standardized achievement tests that assess a
student's relative standing on global skills, criterion-referenced
tests are designed to measure attainment of specific skills within
curricula in terms of designated performance standards. Criterion-
referenced tests are intended to be an integral part of an
instructional system. In principle, the content validity of
criterion-referenced tests is strong, since there is close correspondence
among curriculum, objectives, and test items.

The need for criterion-referenced testing has been documented in
educational psychology and measurement literature (e.g., Gagne, 1965;
Glaser & Nitko, 1971; Popham, 1978). Fifteen years ago, Gagne (1965)
stated:

Despite the existence of rather elaborate technology, it
cannot be said with confidence that the assessment procedures
customarily used in developing standardized tests are entirely
adequate to meet current assessment needs. One important
problem that does not appear to have been included in current
techniques is a method for assessing human performance in
terms of the objectives of instruction. (p. 258)

Criterion-referenced assessment has received increased attention
in the past decade (Popham, 1978). Contributing to this growing
popularity have been two related developments. First, educational
psychologists have recognized that mastery of subunits precedes mastery
of complex tasks. In 1962, Gagne demonstrated the principle that most
students can achieve a complex skill provided they have mastered
prerequisite component skills. A related development is that educational
psychologists (e.g., Bloom, 1971) have applied the above principle in the
development of mastery learning systems. In such systems, the curriculum
is divided into components and objectives, those objectives are
hierarchically arranged, and instruction is directed to the current
instructional objective until the student demonstrates mastery of that
objective on a criterion-referenced test. Criterion-referenced tests,
therefore, are an essential component of a mastery learning system, and
their use has grown concurrently with the increased popularity of mastery
learning systems.

Criterion-referenced assessment of successive units of a curriculum
appears to improve upon the content validity of educational assessment.
However, in its typical format of pre-post testing around instructional
units, criterion-referenced assessment shares a limitation of standardized
achievement testing, namely, the poor reliability characteristic of
difference scores. Additionally, the time schedule according to which
educators administer criterion-referenced tests usually is arbitrary
and most typically determined by the teacher's informal judgment that
the child has mastered the skill and is ready to demonstrate this
mastery on a test. Consequently, the utility of this measurement
format in helping to make decisions about student progress and program
adjustments ultimately is dependent upon the accuracy of the teacher's
unsystematic monitoring of student performance; that is, how accurately
the teacher informally determines that the child is ready to pass the
criterion-referenced test. Therefore, criterion-referenced assessment
is unsatisfactory for systematically monitoring pupil progress.
Repeated, Curriculum-based Mastery Assessment as an Alternative

Repeated, curriculum-based mastery measurement incorporates the
principles of criterion-referenced assessment. It is grounded in the
student's curriculum; it measures progress through a hierarchy of
objectives in that curriculum; and it assesses students in relation to
performance standards rather than in relation to other students.

However, repeated, curriculum-based mastery measurement departs
from the typical criterion-referenced, mastery learning model of
assessment in important ways. It borrows the operant research
methodology of repeated behavior sampling and time-series analysis.
Employing direct and frequent evaluation, a teacher collects repeated, short
samples of a student's behavior within the curriculum, over a time
period, and under different teaching strategies. At regular intervals,
the educator also may measure the performance of mainstream peers on
the same behavior. Then, the teacher applies the methods of time-
series analysis to the data in order to determine the effectiveness of
specific program changes.

Figure 1 illustrates repeated mastery assessment. The abscissa
represents school days and the ordinate represents successive segments
or objectives of the curriculum mastered; each data point represents
the number of curriculum segments mastered on a given day. The line
of best fit through the data points depicts the rate of student progress
through the curriculum. The goal of repeated mastery assessment is
to increase the student's rate of mastery in the curriculum. The teacher
measures the student on a random sample of material from the current
instructional curriculum unit until mastery is achieved, at which point
(a) the student's graph registers that a curriculum unit has been mastered,
(b) the student's level of instruction progresses to the next segment in
the hierarchy, and (c) the pool of material on which the teacher measures
the student also progresses to the next segment in the hierarchy.
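The "line of best fit" through the mastery graph can be summarized by its slope, the number of curriculum units mastered per school day. The original does not say how the line was fitted; the Python sketch below assumes ordinary least squares, and the daily data are hypothetical.

```python
def progress_rate(days, units_mastered):
    """Least-squares slope and intercept of cumulative units mastered on school day."""
    n = len(days)
    if n < 2 or n != len(units_mastered):
        raise ValueError("need two or more paired observations")
    mean_x = sum(days) / n
    mean_y = sum(units_mastered) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, units_mastered))
    sxx = sum((x - mean_x) ** 2 for x in days)
    slope = sxy / sxx  # units mastered per school day
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical 10-day record: cumulative curriculum units mastered by each day.
days = list(range(1, 11))
units = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
slope, intercept = progress_rate(days, units)
print(f"{slope:.2f} units per day")
```

For this made-up record the sketch prints 0.38 units per day. A teacher comparing two teaching strategies could fit a separate line to each phase and compare the slopes; a steeper slope indicates faster progress through the curriculum.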



Insert Figure 1 about here 



In several ways, direct and repeated mastery measurement appears
to strengthen the data base on which educational decisions are made. It
improves upon the content validity of norm-referenced tests by
evaluating student performance in relation to mainstream functioning on
curriculum tasks. It enhances the reliability of measurement because
it is administered frequently and therefore is subject to less error
and to richer analysis. Furthermore, given the typically short duration
of tests and the availability of multiple test forms, it can be employed
continuously to evaluate the appropriateness of educational programs.

Nevertheless, mastery measures present two problems. First, they
lack the demonstrated construct validity of psychometrically adequate
norm-referenced tests. Second, it remains unclear whether performance
on these measures provides information on a student's standing relative
to a large, representative normative group. Expressed more concretely,
the practitioner's concern is whether a student who manifests progress
within mastery measurement (as depicted in Figure 1) can be expected
also to show improved performance on traditionally accepted,
psychometrically sound standardized achievement tests.

If simple mastery tests can be shown to demonstrate these
characteristics, then direct and repeated measurement might represent a
technically adequate educational measurement format that simultaneously
provides essential decision-making information. It may, in fact, represent
the satisfactory data base with which educators can make and document
their decisions.

The purpose of the present investigation was (a) to assess the
extent to which simple, direct progress measures represent the same
constructs as longer, more global achievement tests, (b) to determine whether
performance on simple tests provides information on students' standing
relative to the populations on which norm-referenced achievement tests
were standardized, and (c) to investigate whether progress depicted on
a mastery graph correlates with progress on psychometrically sound
achievement tests. Reading achievement was selected as the focus of
investigation, and the study's purpose was translated into three research
questions:

• Does performance on simple curriculum-based mastery measures
demonstrate concurrent validity with respect to performance
on standardized reading achievement tests?

• Is the strength of association between simple curriculum-based
mastery measures and standardized reading achievement tests dependent
on the instructional criterion employed?

• Is the strength of association between simple curriculum-based
mastery measures and standardized reading achievement tests dependent
on the specific reading material employed?
These research questions deal with the concurrent validity of
direct, curriculum-based mastery measures. Concurrent validity studies
examine the usefulness of a measure in predicting performance on other
variables. Typically, one is interested in assessing the suitability
of substituting a short, simple test for a longer and/or more cumbersome
criterion that has demonstrated technical adequacy (Messick, 1980).
Criterion-relatedness is determined by correlational analysis, where
the strength of a correlation between two measures specifies the degree
of predictive efficiency between the tests (Nunnally, 1967). Therefore,
if criterion validity between simple measures and achievement tests is
demonstrated and correlations are high, then predictive efficiency would
be demonstrated between the tests. On that basis, one might assume
that (a) simple tests demonstrate the validity of, and represent the
same constructs as, the longer, more global achievement tests, (b) the
simple tests provide information on students' standings relative to
the normative population on which the criterion tests were standardized,
and (c) as a student manifests improvement on the simple measure,
his/her standing relative to the normative group also may improve.
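One classical way to express how much a correlation buys in prediction is the coefficient of determination, r squared, together with the index of forecasting efficiency, E = 1 - sqrt(1 - r^2), the proportional reduction in the standard error of estimate relative to predicting every student at the criterion mean. Neither index is reported in the study; the Python sketch below uses illustrative r values only to show why correlations must be high before a short test can stand in for a longer criterion.

```python
import math


def prediction_indices(r):
    """Return (r squared, index of forecasting efficiency) for a validity coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    r_squared = r * r
    # sqrt(1 - r^2) is the standard error of estimate as a fraction of the
    # criterion's standard deviation; E is the share of that error removed.
    efficiency = 1.0 - math.sqrt(1.0 - r_squared)
    return r_squared, efficiency


for r in (0.60, 0.80, 0.95):
    r2, e = prediction_indices(r)
    print(f"r = {r:.2f}: r^2 = {r2:.2f}, E = {e:.2f}")
```

Under this index, even r = .80 reduces the error of prediction by only 40 percent, while r = .95 reduces it by about 69 percent.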

A study of the concurrent validity of simple, direct curriculum-
based mastery measures with respect to technically adequate norm-referenced
achievement tests should make an important empirical contribution to
the field of measurement and special education. If strong predictive
efficiency is demonstrated between these measures, then one might
state that simple, direct, and repeated curriculum-based mastery
measurement not only encompasses the technical strengths of norm-referenced
measurement, but also, as illustrated above, may be more suitable for
providing a data base on which educational decisions can be made.
Additionally, if strong predictive efficiency is demonstrated between
these measures, then the public's acceptance of direct, repeated
measurement might be enhanced legitimately to a level at least
comparable to that of norm-referenced testing. This has implications for
the usefulness of direct, repeated measurement, because the way in which
assessment information is accepted may be an important factor in the
extent to which, and the ways in which, data are employed and interpreted
(Ysseldyke, Algozzine, Regan, Potter, Richey, & Thurlow, 1980).

It also appears important to determine whether the concurrent
validity of curriculum-based mastery measures is dependent on the
curriculum employed. By definition, direct measurement occurs in the
specific curriculum employed within a school. Each curriculum, then,
represents a different measure that needs individual validation. This
represents a difficult, if not impossible, task. If it can be
demonstrated that the specific curriculum employed does not affect the
criterion validity of the measure or the strength of association between
measures, then the need to validate each curriculum separately may be eliminated.

Finally, because progress measurement entails determining mastery
on successive levels of material, it is critical to examine how
different performance standards affect the criterion validity of
the measure or the strength of association between measures. An
examination of the technical adequacy of different performance
standards has potential implications for practitioners who employ all
formats of criterion-referenced measurement. Criterion-referenced
measurement has been criticized repeatedly and severely because of
the lack of empirical support for its performance standards (McLoughlin
& Lewis, 1981; Thorndike, 1971; Wallace & Larsen, 1978). The present
investigation may provide some empirical support for one or more
performance standards.

In addition to the three primary research questions, three other
related questions were addressed. These questions explored other
technical characteristics of repeated mastery assessment.

The first two questions addressed the congruency of students'
instructional scores derived in direct measurement with their performance
on more widely accepted criterion measures. These two questions
supplemented the second research question, which examined the relationship
between the concurrent validity of simple direct measures and the
performance standards employed. Because it is possible, theoretically,
for two measures to correlate well but agree poorly (Bradley, 1977),
one might consider congruency along with concurrent validity when
selecting among performance standards. Specifically, the first related
question was: Is the degree of congruency between instructional level
scores calculated on curriculum materials and teacher judgments of
instructional level scores in the same material dependent on the performance
standard employed? The second related question was: Is the extent of
agreement between instructional level grade scores and the achievement
test grade scores dependent on the performance standard employed?
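Bradley's distinction between correlation and agreement is easy to see with two sets of placements that differ by a constant. In the hypothetical Python example below, measured instructional levels run exactly one level below teacher placements, with book levels coded as integers for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percent_agreement(x, y):
    """Percentage of cases in which the two placements are identical."""
    return 100.0 * sum(a == b for a, b in zip(x, y)) / len(x)


# Hypothetical book levels (coded as integers) for six students.
measured = [1, 2, 3, 4, 5, 6]
teacher = [2, 3, 4, 5, 6, 7]
print(round(pearson_r(measured, teacher), 3))   # correlation
print(percent_agreement(measured, teacher))     # exact agreement, in percent
```

Here the correlation is 1.0 while exact agreement is 0 percent, which is why congruency is examined separately from concurrent validity.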

The third related question addressed the potential sensitivity of
the mastery measures to student achievement. Measures on which students
manifest a relatively large range of behavior provide greater opportunity
for students to register relatively small gains. A large range of
potential behavior that results in heightened sensitivity to student growth
is a desirable characteristic of repeated measurement. Therefore, the
last related question asked: Is the range of behavior or the average
progress per grade level dependent on the performance standard employed?

Method 

Subjects 

Subjects were 91 randomly selected children, distributed across
grades one through six, in one metropolitan public elementary school in
the Midwest. All children were English speaking. Fifteen received
special education resource services, and 23 were enrolled in Elementary
and Secondary Education Act Title I programs for children with reading
problems.

Measures 

Three types of measures were employed in the study: standardized 
achievement tests, teacher judgments, and graded reading passages. 

Standardized achievement tests. The Word Identification (WI)
and Passage Comprehension (PC) tests of the Woodcock Reading Mastery
Tests (Woodcock, 1973), Form A, were employed. The WI test consists
of 150 words ranging in difficulty from preprimer level to words of
above-average difficulty for twelfth grade students. The easier items
were selected from the vocabulary introduced in seven basal reading
programs from the first preprimer through the third grade reader
(Woodcock, 1973). The more difficult items were drawn primarily from
the Thorndike-Lorge List (Thorndike & Lorge, 1944). The subject's task
in the WI test is to name words.

The PC test contains 85 items of a modified cloze procedure
(Bormuth, 1969). The subject's task is to read silently a passage
from which a word has been deleted and to supply orally to the examiner
an appropriate missing word. The passages range in difficulty from
first grade to college level (Woodcock, 1973).

Teacher judgments. For each student, teachers reported the book
level in Ginn 720 (1976) from which the student read for instruction.

Reading passages. Reading passages from the Ginn 720 and the
Scott-Foresman Unlimited (1976) series were employed in measurement.
For 10 levels in Ginn and 9 levels in Scott-Foresman, two 100-word
reading passages were selected as representative of the average
readability level of the material from which the passages were drawn.
Representative passages were employed because of Fitzgerald's (1980)
finding of great variability in the readability of passages
from the same books within seven reading series. Within repeated
measurement, the effect of this variability on the reliability of a
student's score is diminished because an average or median level across
multiple observations is employed to describe a pupil's performance.
In the current study, it was not feasible to measure students repeatedly.
So, in an attempt to improve the reliability of students' scores and
thereby to represent more accurately the technical adequacy of repeated
measurement, passages were selected as representative of the average
readability of the material from which they were drawn. (See Fuchs &
Deno, 1981, for a description of the passage selection procedure.)
Table 1 displays publishers' level numbers and grade levels, and
readability information for each selected passage of both series.

Insert Table 1 about here 

Procedure 

Prior to testing, the classroom teachers completed and returned to
the investigator a form on which they indicated the students' actual
Ginn placements. Also, five examiners were identified and trained in the
administration and scoring of all measures.

During a 45- to 60-minute session, each subject was tested individually
on all measures by one randomly determined examiner in one of four quiet
and isolated locations within the school. The WI and PC tests were
administered according to the Manual (Woodcock, 1973). For both series,
the reading passages were administered in a random order employing the
following procedure: The examiner found the appropriate passage in a
teacher notebook containing all passages and the corresponding passage
in a student notebook containing all passages. As the examiner exposed
the passage to the student, the examiner said, "I'd like you to read
aloud some words to me as quickly as you can. If you don't know a word,
skip it. Try your hardest. Remember to read very quickly. I'll tell
you when to stop. Any questions?" The examiner said "Begin" as he/she
started a stopwatch; as the student read, the examiner wrote with a
transparency pen on the acetate covering the teacher copy. Making
sure that his/her writing was hidden from the subject, the examiner
crossed out omissions, substitutions, insertions, and mispronunciations.
If the student completed a passage in less than 60 seconds, the
examiner wrote the number of seconds in which the student read the passage.
At 60 seconds, the examiner told the student to stop. With each
subsequent passage, the examiner repeated this procedure except for the
directions, for which the examiner simply said, "Any questions? Ready to
read?" After all testing was completed for a student, the examiner scored
each passage by counting words correct and words incorrect. Then, on a
recording form, the examiner wrote these scores in the appropriate spaces
and indicated the number of seconds for those passages that the student
completed in less than 60 seconds.
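The counts recorded on the scoring form can be converted to per-minute scores. The Python sketch below assumes that a passage finished in under 60 seconds is prorated to a one-minute rate from the recorded time, and that percent correct is words correct divided by words attempted (correct plus incorrect); the original does not spell out either computation, and the example passage is hypothetical.

```python
from typing import NamedTuple


class PassageScore(NamedTuple):
    wpm: float          # words correct per minute
    epm: float          # errors (words incorrect) per minute
    pct_correct: float  # percentage of attempted words read correctly


def score_passage(words_correct, words_incorrect, seconds=60):
    """Convert raw counts from one timed passage to per-minute scores.

    seconds: reading time; 60 when the examiner stopped the student at one
    minute, or the recorded time when the passage was finished early.
    """
    if not 0 < seconds <= 60:
        raise ValueError("reading time must be between 0 and 60 seconds")
    attempted = words_correct + words_incorrect
    wpm = words_correct * 60 / seconds
    epm = words_incorrect * 60 / seconds
    pct = 100 * words_correct / attempted if attempted else 0.0
    return PassageScore(wpm, epm, pct)


# Hypothetical: a 100-word passage finished in 50 seconds with 5 errors.
print(score_passage(95, 5, seconds=50))
```

For the example passage this returns PassageScore(wpm=114.0, epm=6.0, pct_correct=95.0).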

For each of the 19 passages, each student received a words correct 
per minute score, an errors per minute score, and a percent correct per 
minute score. On the basis of these scores, a student was assigned, 
within each series, seven different instructional level scores based on 
the following criteria of instructional level: 

Instructional Criterion 1: For preprimer (PP) through grade 3
books, 30-49 words per minute (wpm) with 7 or fewer errors per
minute (epm); for grade 4 through grade 6 books, 50+ wpm with
7 or fewer epm (Starlin & Starlin, 1974).

Instructional Criterion 2: 70+ wpm with 10 or fewer epm (Starlin,
1979).

Instructional Criterion 3: 100+ wpm with 0-2 epm (Haring, Liberty,
& White, undated).

Instructional Criterion 4: 95% accuracy (Betts, 1946; Harris,
1961; Powell, 1971).

Instructional Criterion 5: 70+ wpm and 95% accuracy.

Instructional Criterion 6: For PP through grade 2 books, 50+
wpm and 95% accuracy; for grades 3 through 6, 70+ wpm and 95%
accuracy.

Instructional Criterion 7: For PP through grade 2 books, 50+
wpm and 85% accuracy (Powell, 1971); for grade 3 through grade
6 books, 70+ wpm with 95% accuracy.
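Criteria 2 through 7 can be written as simple rules over the three passage scores and the grade level of the book. The Python sketch below is one reading of the list above, not the authors' scoring procedure. It assumes book levels are coded as integers with 0 standing for preprimer and primer books, reads "PP through grade 2" as grade <= 2, and omits Criterion 1 because its printed 30-49 wpm band could be meant either as a bounded range or as a floor.

```python
from typing import Callable, Dict

# A rule receives words correct per minute, errors per minute, percent correct,
# and the book's grade level (0 assumed for preprimer/primer), and returns True
# when the passage meets that instructional criterion.
Rule = Callable[[float, float, float, int], bool]

CRITERIA: Dict[int, Rule] = {
    2: lambda wpm, epm, pct, grade: wpm >= 70 and epm <= 10,
    3: lambda wpm, epm, pct, grade: wpm >= 100 and epm <= 2,
    4: lambda wpm, epm, pct, grade: pct >= 95,
    5: lambda wpm, epm, pct, grade: wpm >= 70 and pct >= 95,
    6: lambda wpm, epm, pct, grade: pct >= 95 and wpm >= (50 if grade <= 2 else 70),
    7: lambda wpm, epm, pct, grade: (
        (wpm >= 50 and pct >= 85) if grade <= 2 else (wpm >= 70 and pct >= 95)
    ),
}


def meets(criterion, wpm, epm, pct, grade):
    """True if one passage's scores satisfy the given criterion (2-7)."""
    return CRITERIA[criterion](wpm, epm, pct, grade)


# Hypothetical grade-2 passage: 55 wpm, 4 epm, 88% correct.
print({c: meets(c, 55, 4, 88, 2) for c in sorted(CRITERIA)})
```

For this hypothetical passage only Criterion 7 is met, because of its lower 85% accuracy standard for primary-grade books; the same scores can therefore place a student differently depending on the criterion chosen.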

For each criterion, within each series, an instructional level 
score was assigned to each student by identifying the highest level at 
which the criterion was met before an unsatisfactory performance was 
demonstrated at two consecutive levels. 
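The placement rule can be sketched as a scan upward through the book levels of a series that stops at the first pair of consecutive failures. The Python sketch assumes passages are ordered from easiest to hardest and that a student who never meets the criterion before two consecutive failures receives no instructional level (None); the level labels in the example are hypothetical.

```python
def instructional_level(levels, passed):
    """Highest level passed before two consecutive failures.

    levels: level labels ordered from easiest to hardest
    passed: booleans, True where the criterion was met at that level
    """
    highest = None
    consecutive_failures = 0
    for level, ok in zip(levels, passed):
        if ok:
            highest = level
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures == 2:
                break  # stop scanning: passes at later levels are not credited
    return highest


levels = ["PP", "P", "1", "2-1", "2-2", "3-1", "3-2"]
passed = [True, True, False, True, False, False, True]
print(instructional_level(levels, passed))
```

The example prints 2-1: the single failure at level 1 does not stop the scan, but the failures at 2-2 and 3-1 do, so the pass at 3-2 is ignored.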

Results

Do Simple Curriculum-based Mastery Measures Predict Performance on
Standardized Reading Achievement Tests of Word Identification and
Comprehension?

To examine this question, a Pearson product-moment correlation
matrix was generated, including the seven instructional level scores
for both series and the PC and WI raw scores. Table 2 displays these
correlations. Inspection of this table reveals that correlations were
moderate to high and were statistically significant (p < .001). The
correlations ranged from .57 to .95; 23 of the 28 correlations (seven
criteria, two series, two tests) were greater than or equal to .80.
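A correlation matrix like the one summarized in Table 2 can be computed pairwise with the Pearson formula. The Python sketch below uses hypothetical variable names and toy scores, not the study's data, and leaves out significance testing because that needs a t-distribution function from a statistics library.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(scores):
    """Pairwise Pearson r for every pair of named score lists (same students, same order)."""
    names = list(scores)
    matrix = {(name, name): 1.0 for name in names}
    for a, b in combinations(names, 2):
        r = pearson_r(scores[a], scores[b])
        matrix[(a, b)] = matrix[(b, a)] = r
    return matrix


# Toy scores for five students (hypothetical names and values).
scores = {
    "ginn_level_c4": [2, 3, 5, 6, 8],
    "wi_raw": [30, 41, 55, 60, 80],
    "pc_raw": [10, 14, 20, 27, 33],
}
m = correlation_matrix(scores)
print(round(m[("ginn_level_c4", "wi_raw")], 2))
```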

Insert Table 2 about here 



Is the Strength of Association Between a Simple Curriculum-based Mastery
Measure and a Standardized Reading Achievement Test Dependent on the
Instructional Criterion Employed?

To investigate this question, the correlations in Table 2 were
examined across series and within instructional criteria. Averaged
within instructional criteria, at least one criterion of instruction
appeared to affect the strength of association between the simple
curriculum-based mastery measures and the standardized reading
achievement tests. For Criterion 3, the average of the four
correlations was .62, lower than any other criterion's average by at
least .23. The average correlation was highest for Criterion 1 (.93). The
correlations produced by the remaining criteria were similar, ranging
from an average of .85 for Criteria 4 and 5 to an average of .87 for
Criteria 2 and 7.

Unfortunately, there is no appropriate statistical test for
determining the difference between two dependent correlations when
the two correlations do not share an identical set of scores. This
makes it impossible to test statistically the difference between many
of the correlations calculated on the same sample in this study.
Additionally, where questions concern differences between two dependent
correlations that share one identical set of scores (r_xz with r_yz),
the available test limits any inference to only a subpopulation of all
possible samples for which X and Y have exactly the same set of values
as those in the observed sample (Walker & Lev, 1969). Consequently, the
utility of such a test is limited, and given the dependency in the data
and the large number of analyses run, it appears appropriate to forego
these additional statistical analyses (Terwilliger, 1980). Therefore,
the differences in the data are discussed without the benefit of
statistical probability.
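For the one case the text says a test exists (two correlations, r_xz and r_yz, that share Z), the test usually cited is Hotelling's t, which treats X and Y as fixed and so restricts inference in the way described above. The sketch below shows its standard formula; it is not an analysis the authors ran, and the function name `hotelling_t` and the example correlations are hypothetical.

```python
import math


def hotelling_t(r_xz, r_yz, r_xy, n):
    """Hotelling's t for H0: rho_xz == rho_yz, where both correlations share Z.

    Returns (t, df) with df = n - 3. X and Y are treated as fixed, so any
    inference is conditional on their observed values.
    """
    # Determinant of the 3 x 3 correlation matrix of X, Y, and Z
    det_r = 1 - r_xy**2 - r_xz**2 - r_yz**2 + 2 * r_xy * r_xz * r_yz
    if det_r <= 0:
        raise ValueError("correlations do not form a valid correlation matrix")
    t = (r_xz - r_yz) * math.sqrt((n - 3) * (1 + r_xy) / (2 * det_r))
    return t, n - 3


# Hypothetical values: r_xz = .50, r_yz = .30, r_xy = .40, n = 103
t, df = hotelling_t(0.5, 0.3, 0.4, 103)
print(f"t({df}) = {t:.3f}")
```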
Is the Strength of Association Between a Simple Curriculum-based Mastery
Measure and a Standardized Reading Achievement Test Dependent on the
Specific Reading Material Employed?

For each instructional criterion, the correlations in Table 2 were
examined within series. As is evident in the table, all correlations
within series were high and statistically significant (p < .001). For
Series A the mean correlation (.87) was somewhat higher than for Series
B (.82). Furthermore, within each instructional criterion and within
each achievement test (PC vs. WI), the correlation for Series A was
consistently higher than for Series B. However, the difference between
these mean correlations (.87 - .82 = .05) was small and probably does
not represent a reliable difference.
Additional Analyses 

In addition to the research questions explored above, three analyses
relating to other technical characteristics of repeated mastery
measurement were completed on the data collected in this study.

Congruency between instructional level scores calculated on curriculum
materials and teacher judgments of instructional level in the same
material, and relationship of congruency to the performance standard
employed. The degree of congruency between instructional scores and
teacher placements was examined by calculating, for each instructional
criterion, the percentages of students whose instructional level scores
placed them the same as, below, or above the teacher placements. These
percentages are displayed in Table 3. Inspection of this table reveals
that Instructional Criteria 4, 5, 6, and 7 were similar in congruency,
with an average of 19.5% of students placed below, 64.5% placed the
same, and 15.8% placed above the teacher placement. The distribution
for Criterion 2 was similar to that of Criteria 4 through 7; however,
it placed a greater percentage of students (29.0%) above the teacher
placement. Criterion 1 placed a large percentage of students (50.0%)
above, while Criterion 3 placed the greatest percentage of students
(58.0%) below the teacher placement.


Insert Table 3 about here 
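The congruency tally described above can be sketched as follows. This is a minimal illustration, assuming placements are coded as comparable book-level numbers and that students without a teacher placement are dropped (as two were for Table 3); `congruency_percentages` and the example placements are hypothetical. The last lines check the Criteria 4 through 7 averages against Table 3 as transcribed.

```python
from collections import Counter


def congruency_percentages(cbm_levels, teacher_levels):
    """Percent of students whose curriculum-based placement is below, the same as,
    or above the teacher placement.

    Pairs with no teacher placement (None) are skipped.
    """
    counts = Counter()
    for cbm, teacher in zip(cbm_levels, teacher_levels):
        if teacher is None:
            continue
        if cbm < teacher:
            counts["below"] += 1
        elif cbm == teacher:
            counts["same"] += 1
        else:
            counts["above"] += 1
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no students with both placements")
    return {key: 100 * counts[key] / n for key in ("below", "same", "above")}


# Hypothetical placements for four students; the last has no teacher placement.
print(congruency_percentages([5, 6, 7, 8], [6, 6, 6, None]))

# Averaging the Table 3 rows for Criteria 4-7 (below, same, above), as transcribed.
TABLE_3_C4_TO_C7 = [(21, 61, 18), (23, 63, 15), (19, 65, 14), (15, 69, 16)]
means = [sum(column) / len(column) for column in zip(*TABLE_3_C4_TO_C7)]
print(means)  # [19.5, 64.5, 15.75]
```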

Correlated t tests corroborated this pattern of congruency for the
different instructional criteria. The difference between the
instructional scores and the teacher placement was statistically
significant for Criterion 1, t(89) = 8.42, p = .000 (mean difference =
1.87), and for Criterion 2, t(89) = 2.29, p = .000 (mean difference =
.54). For Criterion 3, the difference also was statistically
significant, t(89) = -7.72, p = .000. For this criterion, however, the
teacher placements were above the instructional scores (mean difference
= -2.32). For Criteria 4 through 7 there was no statistically
significant difference.
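The correlated t test above can be sketched as a paired-samples t on each student's difference between instructional score and teacher placement. The text does not give the formula, so this assumes the standard version with df = n - 1; `correlated_t` and the four students' scores are hypothetical.

```python
import math
import statistics


def correlated_t(x, y):
    """Correlated (paired-samples) t test on the differences x[i] - y[i].

    Returns (t, df, mean_difference), with df = n - 1.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1, mean_d


# Hypothetical instructional scores and teacher placements for four students.
instructional = [3, 4, 5, 6]
teacher = [1, 3, 3, 5]
t, df, mean_diff = correlated_t(instructional, teacher)
print(f"t({df}) = {t:.2f}, mean difference = {mean_diff:.2f}")
```

A positive mean difference corresponds to curriculum-based placements above the teacher placements (as for Criteria 1 and 2); a negative one, as for Criterion 3, means the teacher placed students higher.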

Agreement between the instructional grade scores and the achievement
test grade scores, and relationship to the criterion of instructional
level employed. The degree of congruency between instructional grade
scores and achievement test grade scores was examined by calculating,
for each instructional criterion, the percentages of students whose
instructional grade scores placed them below, at the same level as, and
above the PC and WI grade scores. Therefore, four combinations of
congruency percentages were calculated: Series A instructional grade
scores with PC and with WI grade scores, and Series B instructional
grade scores with PC and with WI grade scores. The average percentages
across the four combinations are presented in Table 4.

Insert Table 4 about here 

The extent of congruency was again similar for Criteria 4, 5, 6, and
7, with averages across the four criteria being 51.39% of students
placed the same, 10.18% placed above, and 38.43% placed below the
achievement grade scores. Criterion 2 presented a similar pattern, with
approximately equal percentages placed below and above the achievement
scores. Criterion 3 placed a large percentage of students (60.25%)
below, while Criterion 1 placed a large percentage of students (43.25%)
above. Again, correlated t tests corroborated this pattern of
congruency for the instructional criteria. For Criteria 1 and 3, the
difference between the instructional grade scores and the achievement
test grade scores was always statistically significant (t(91) ≥ 3.35,
p < .001 for Criterion 1, and t(91) ≥ 5.33, p = .000 for Criterion 3).
Criterion 1 placed students above by an average of .55 levels;
Criterion 3 placed students below by an average of 1.29 levels. For
Criterion 2, the average difference was the smallest (.11 levels).

Average increase per grade level as a function of the instructional
criterion employed. Within series and for each instructional criterion,
the mean instructional level score for each grade level was graphed
(see Figures 2 and 3). Next, by series and by instructional criterion,
the average increase per grade level was calculated. Finally, these
means were averaged across series (see Table 5).


Insert Table 5 and Figures 2 and 3 about here


Visual inspection of Figures 2 and 3 and analysis of Table 5 reveal
that, across series, the average increase per grade level was similar
for all criteria but the third, where the average increase was
relatively small (.95 levels per grade, compared with 1.72 to 1.83 for
the other criteria).
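The text does not spell out how the average increase was computed; a natural reading, assumed in this sketch, is the mean of the differences between adjacent grade-level means, followed by a simple average of the two series. `average_increase_per_grade` and the example grade means are hypothetical; the Table 5 values are as transcribed.

```python
def average_increase_per_grade(grade_means):
    """Mean of the differences between successive grade-level mean scores.

    grade_means: mean instructional level score for each grade, in grade order.
    """
    if len(grade_means) < 2:
        raise ValueError("need at least two grade levels")
    steps = [later - earlier for earlier, later in zip(grade_means, grade_means[1:])]
    return sum(steps) / len(steps)


# Hypothetical grade-level means for one series under one criterion.
print(average_increase_per_grade([2.0, 3.5, 5.0, 7.0]))

# Averaging across series, using Table 5 (Series A, Series B) as transcribed.
TABLE_5 = {1: (1.90, 1.76), 2: (1.90, 1.72), 3: (0.96, 0.94), 4: (1.76, 1.74),
           5: (1.76, 1.68), 6: (1.88, 1.64), 7: (1.88, 1.68)}
across_series = {c: (a + b) / 2 for c, (a, b) in TABLE_5.items()}
print(across_series)
```

One property of this definition: the mean of successive differences telescopes to (last mean - first mean) / (number of grades - 1), so under this assumption the statistic depends only on the lowest and highest grade means.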

Discussion

In analyzing the success of the curriculum-based measures in
predicting achievement test performance, the statistically significant,
moderate to high correlations provided evidence for the concurrent
validity of curriculum-based mastery reading measures with word
recognition and comprehension achievement tests. When correlations
with Passage Comprehension were compared with correlations with Word
Identification scores, correlations within instructional criteria and
series were always similar, even though the criteria did not require
students to demonstrate any comprehension of the material. This may be
explained by the fact that the Woodcock Passage Comprehension Test uses
a cloze procedure that asks students to read words rather than to
answer comprehension questions. Nevertheless, the progress measures do
appear to predict performance on both valid and reliable standardized
tests of reading comprehension and word identification.

Seven instructional criteria based on oral reading in context were
employed to explore the dependence of the above association on the
performance standard employed. Criteria 1 through 3 were selected
because they are advocated by Precision Teachers (Alper, Nowlin,
Lemoine, Perine, & Bettencourt, 1973; Haughton, 1972; Starlin, 1979;
Starlin & Starlin, 1974). Criterion 4 was employed because it is the
traditionally accepted informal reading inventory instructional
criterion of word recognition accuracy (Betts, 1946; Harris, 1961;
Powell, 1971). Criteria 5 and 6 represented combinations of the rate
and percentage-accuracy criteria found in the first three criteria. In
Criterion 7, a lower standard of 85% accuracy for students in books
preprimer through grade 2 was introduced because Powell (1971)
demonstrated that preprimer through second grade readers maintained 70%
comprehension while their word recognition accuracy was at 85% or
better.

All correlations between the instructional scores and the Passage
Comprehension and Word Identification scores were moderate to high and
statistically significant regardless of the instructional criterion
employed. Yet a careful comparison among the average correlations
associated with each instructional criterion revealed that at least one
criterion of instructional level was a differentially poor predictor.
Criterion 3, the most stringent criterion, placed many students at low
reading levels and failed to discriminate effectively among readers
with different skills; this resulted in lower correlations and less
efficient prediction of performance on the achievement tests.
Therefore, the strength of association between curriculum-based mastery
measures and standardized reading achievement tests does appear to be
affected by the instructional criterion employed. Results of this study
suggest that as practitioners select an instructional criterion to
employ within direct and repeated curriculum-based measurement, they
might opt for rates between 30 and 70 wpm and/or percentages between 85
and 95.

In contradistinction to these results, Beck (1980), in the Sacajawea
Project, suggested rates as high as 150 wpm. The discrepancy between
Beck's recommendations and those based on this study may be explained
by a distinction between proficiency and fluency (Brent, Arnold, &
DuRoss, 1978). "Proficiency" is the performance standard that results
in long-term maintenance following intense practice. "Fluency" is the
level of performance that represents competency on unfamiliar material.
In this study, fluency was assessed: children read primarily unfamiliar
material, and instructional placements were determined from that
performance. Beck's interest, however, is proficiency, or the level at
which, after intense practice, a student can progress to new material.
It may be that, with familiar material, the more stringent mastery
criterion of 150 wpm would result in higher correlations and better
congruency.

The third major question addressed in the present study asked whether
the association between curriculum-based measures and standardized
achievement tests was dependent on the reading curriculum employed. To
explore this question, two basal reading series with different program
emphases were selected. One of the series, Ginn 720, is representative
of many basal programs in its eclectic approach to reading instruction.
The other program, the Scott-Foresman Unlimited series, places a greater
emphasis on comprehension and study skills. It was reasoned that if the
strength of association demonstrated by two different types of reading
curricula were similar, then one might generalize that the strength of
association would be similar across other curricula as well.

Across criteria and within the two series, all correlations were
statistically significant, high, and similar. Neither the criterion
validity of these measures nor the strength of their association with
standardized reading achievement tests, then, appears to be dependent
on the reading material employed. To the extent that the curricula used
are representative of basal reading series, curriculum-based mastery
measurement from different reading curricula demonstrates strong
predictive efficiency and concurrent validity with respect to
achievement tests. Apparently, the practitioner might assume that the
selection of reading series does not affect the validity of
curriculum-based measurement and that curriculum-based measures can be
used across different basal reading series.

In addition to the major research questions of this study, the issue
of congruency between curriculum-based measurement and both teacher
judgments and achievement tests was explored. These two analyses were
conducted because it is possible, theoretically, for two measures to
correlate well but agree poorly (Bradley, 1977). In selecting among
instructional criteria, one might well consider congruency along with
the strength of association.
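Bradley's point that two measures can correlate well but agree poorly can be shown with a toy example. In this sketch (all names and scores hypothetical), one measure places every student exactly two levels above the other: the Pearson correlation is perfect, yet no placement agrees.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percent_exact_agreement(x, y):
    """Percentage of cases in which the two scores are identical."""
    return 100 * sum(a == b for a, b in zip(x, y)) / len(x)


# Hypothetical grade placements: measure_b places every student two levels higher.
measure_a = [1, 2, 3, 4, 5, 6]
measure_b = [a + 2 for a in measure_a]
print(pearson_r(measure_a, measure_b))                # 1.0
print(percent_exact_agreement(measure_a, measure_b))  # 0.0
```

Correlation is unaffected by a constant shift, which is why agreement has to be examined separately when a measure is used for placement.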

In the present study, the results revealed, first, that the degree of
congruency between teacher placements and the curriculum-based
placements varied with the instructional criteria used. Second, the
extent of agreement between curriculum-based mastery measures and
achievement test grade scores differed when different instructional
criteria were employed. The degree of criterion validity of
curriculum-based measures appeared to be dependent on the criteria
employed in the measurement. Additionally, the results empirically
demonstrated Bradley's contention that it is possible for two measures
to correlate well but agree poorly. For example, Criterion 1 produced
the highest average correlation but did not agree well with either of
the criterion measures.

As the practitioner selects a criterion of instructional level to
implement within repeated curriculum-based mastery measurement, he/she
might opt for one of the other instructional criteria that
simultaneously produced good correlations and agreed well with the
criterion measures. Several legitimate standards might be selected for
determining an acceptable mastery criterion. Assuming as a dual
standard for good agreement at least 50% equivalent placements and an
educationally unimportant difference of .50 level or less between
mastery level (grade) scores and other criteria, Instructional
Criteria 2, 4, 6, and 7 appear acceptable. Criterion 2 is 70+ wpm with
10 or fewer errors across grade levels. Criterion 4 is 95% accuracy (in
a one-minute sample). Criteria 6 and 7 employ different oral reading
rates for primary (50 wpm) and intermediate (70 wpm) readers, with
95%/95% or 85%/95% accuracy criteria, respectively. Any one of these
four criteria appears to be a good choice for practitioners.
Additionally, for ongoing use of a mastery criterion where one is
interested in proficiency rather than fluency, one might consider
Beck's recommendation of 150 wpm. However, the external validity of
this criterion is unclear.
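The four recommended criteria can be expressed as simple checks on a one-minute oral reading sample. This sketch rests on assumptions not stated in this section: rates are words read correctly per minute, accuracy is words correct divided by words attempted, "primary" means preprimer through grade 2 material, and a student's instructional level is the highest level whose sample meets the criterion. `Sample`, `meets`, `instructional_level`, and the example samples are hypothetical; Criteria 1, 3, and 5 are not defined here, so they are not encoded.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One-minute oral reading sample from a passage at a given book level."""
    level: int          # position in the series' level sequence (hypothetical numbering)
    primary: bool       # True for preprimer through grade 2 material (assumed cutoff)
    words_correct: int
    errors: int

    @property
    def accuracy(self) -> float:
        total = self.words_correct + self.errors
        return self.words_correct / total if total else 0.0


def meets(criterion: int, s: Sample) -> bool:
    """Hypothetical encodings of Criteria 2, 4, 6, and 7 as described in the text."""
    if criterion == 2:
        return s.words_correct >= 70 and s.errors <= 10
    if criterion == 4:
        return s.accuracy >= 0.95
    if criterion in (6, 7):
        rate = 50 if s.primary else 70
        accuracy = 0.85 if (criterion == 7 and s.primary) else 0.95
        return s.words_correct >= rate and s.accuracy >= accuracy
    raise ValueError(f"criterion {criterion} is not encoded in this sketch")


def instructional_level(criterion, samples):
    """Highest level whose sample meets the criterion, or None (assumed rule)."""
    passed = [s.level for s in samples if meets(criterion, s)]
    return max(passed) if passed else None


# Hypothetical samples for one student across three levels.
samples = [Sample(1, True, 60, 2), Sample(2, True, 52, 8), Sample(3, False, 65, 3)]
for c in (2, 4, 6, 7):
    print(c, instructional_level(c, samples))
```

The same three samples yield four different placements, which illustrates why congruency with other measures depends on the criterion chosen.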

When selecting between percentage and percentage-rate criteria, there
are several instructional considerations. Precision Teaching experts
(Cohen, 1975; Lindsley, 1971; Haughton, 1969) argue that rate (a) is
more sensitive to behavioral change than is percentage, (b) provides a
basis for comparing performance among curricula, (c) communicates speed
and accuracy rather than just accuracy, and (d) imposes no performance
ceiling as does percentage. Further, although percentage implies a
reciprocal relationship between correct and incorrect responses, this
is not necessarily the case. It is possible for a student to score 90%
on two days and for that student's performance to be qualitatively
different. The same percentage score may be based on differing numbers
of errors and words correct from day to day. Therefore, the combination
of rate and percentage communicates more information than percentage
alone; for instructional planning purposes, the practitioner may prefer
one of the percentage-rate combination performance standards. At the
same time, it should be remembered that in the present study the
reading sample was time limited, so even the 95% criterion was in some
sense a rate and accuracy criterion.
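The point that one percentage can describe different performances can be made concrete with hypothetical numbers. The `describe` helper below is an illustrative sketch, not a procedure from the study; it reports accuracy together with correct and error rates for a timed sample.

```python
def describe(words_correct, errors, minutes=1.0):
    """Accuracy percentage plus correct and error rates per minute for one sample."""
    total = words_correct + errors
    return {
        "percent_correct": 100 * words_correct / total,
        "correct_per_min": words_correct / minutes,
        "errors_per_min": errors / minutes,
    }


# Two hypothetical one-minute samples, both 90% accurate.
day_1 = describe(45, 5)
day_2 = describe(90, 10)
print(day_1)
print(day_2)
```

Both days show 90% accuracy, but the second reader is reading twice as fast, a difference that only the rate figures reveal.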

Another related issue addressed in the present study was the potential
sensitivity of different measures to student achievement progress.
Measures on which students manifest a relatively large range of
behavior provide greater opportunity for students to register
relatively small gains. A large range of potential behavior, which
results in heightened sensitivity to student growth, is a desirable
characteristic of repeated measurement.

Criterion 3 appeared to produce a differentially low rate of average
progress both within and across series, suggesting that the average
progress per grade level might be dependent on the instructional
criterion employed. Yet, across the six remaining criteria, there
appeared to be no effect. This leads one to infer that only the third,
most stringent criterion, which also resulted in relatively poor
association and poor agreement with criterion measures, differentially
affected the average progress per grade.
Conclusions

The research findings in this investigation support several
generalizations. They are summarized as follows:

• The validity of simple curriculum-based mastery measurement is
  strong. Performance on curriculum-based mastery reading
  measures is highly related to performance on a valid and
  reliable standardized reading achievement test.

• The validity of simple curriculum-based mastery measures
  is dependent on the instructional criterion employed.

• The validity of simple curriculum-based reading measures
  appears to be independent of the specific reading series
  employed.

• The degree of congruency between instructional level scores
  calculated by simple curriculum-based measures and teacher
  placements is dependent on the instructional criterion employed.

• The extent of agreement between the instructional grade scores
  calculated by simple curriculum-based measures and achievement
  test grade scores is dependent on the instructional
  criterion employed.

• In mastery measurement, the range of behavior, or average
  increase per grade level, appears to be dependent on the
  instructional criterion employed.

• Of the instructional criteria employed in this investigation,
  those employing (a) 70 wpm with 88% accuracy, (b) 95% accuracy,
  and (c) different oral reading rates for primary (50 wpm) and
  intermediate (70 wpm) readers with 95%/95% and 85%/95% accuracy,
  respectively, are good choices. They render scores that
  demonstrate strong criterion validity with respect to
  achievement tests, and they produce scores that agree well
  with teacher placements and with achievement test scores.

As discussed in the introduction to this paper, the simple
curriculum-based mastery measures validated in this investigation,
when employed repeatedly and analyzed with time-series methodology,
yield more useful data for making educational decisions than
traditional testing formats. First, direct measures evaluate student
performance in relation to mainstream functioning. Second, they can be
administered frequently and therefore reduce error, enable richer
analyses, and allow programs to be evaluated continuously. Finally,
because they are direct, their content validity is high, providing
useful data concerning student progress in the curriculum where the
student functions.

Norm-referenced achievement tests, however, do present three distinct
advantages over direct and repeated measurement. They have demonstrated
construct validity. They provide information on students' standings
relative to large, well-represented groups of children, and they are
better accepted among professionals and parents.


Results from this study indicate that simple curriculum-based measures
represent the same constructs as the longer, more global tests and that
they can be used to provide information on students' standings not only
relative to mainstream peers but also relative to the same large
representative population on which the Woodcock Reading Mastery Tests
were normed. It also appears that improved performance on direct and
repeated measures might indicate improved standing relative to that
large representative population. Therefore, it appears that repeated
curriculum-based mastery measurement may fill a void in educational
measurement.
Table 1

Level Numbers, Grade Levels, and Readability
Information of Passages from Two Reading Series

Series(a)  Level    Grade   Mean Readability   N(b)   SD(c)   Mean Readability
           Number   Level   Score Across                      Scores of Two
                            Passages                          Selected Passages

Series A   3-4      PP-P    2.02                 8    .098    2.01
           5        1-1     2.21                 5    .117    2.20
           6        2-1     2.43                 6    .196    2.43
           7        2-2     3.17                13    .536    3.10
           8        3-1     3.60                10    .468    3.66
           9        3-2     4.11                 6    .142    4.05
           10       4       5.00                11    .476    5.00
           11       5       5.38                10    .534    5.36
           12       6       5.81                14    .392    5.75
           13       7       6.00                13    .593    6.03

Series B   2-3      PP-P    2.57                 9    .439    2.57
           4        1       2.73                 5    .156    2.77
           5-6      2-1     2.87                10    .282    2.95
           7-8      2-2     3.29                 7    .293    3.30
           9-10     3-1     3.64                 9    .754    3.59
           11-12    3-2     4.02                 ?    .520    3.94
           13-15    4       4.89                 5    .252    4.82
           16-18    5       5.64                11    .525    5.70
           19-21    6       6.04                13    .144    6.03

a Series A is Ginn 720 and Series B is Scott-Foresman Unlimited.
b Number of passages employed.
c Standard deviation across passages.
Table 2

Correlations Between Instructional Scores on Simple Measures and
Raw Scores on the Passage Comprehension (PC) and Word
Identification (WI) Achievement Tests (N=91)

Instructional                 Correlation(a)
Criterion       Series(b)   with PC   with WI

1               A           .93       .95
1               B           .92       .92
2               A           .92       .89
2               B           .87       .82
3               A           .65       .62
3               B           .63       .57
4               A           .88       .88
4               B           .82       .81
5               A           .90       .88
5               B           .83       .78
6               A           .91       .89
6               B           .85       .80
7               A           .93       .91
7               B           .89       .86

a All correlations were statistically significant (p < .001).
b Series A is Ginn 720 and Series B is Scott-Foresman Unlimited.
Table 3

Percentages of Students Placed Below, the Same, and Above Teacher
Placements by Each Instructional Criterion (N=89)(a)

                  Placement by Curriculum-based Measures
Instructional     Compared to Teacher Placement
Criterion         Below     Same     Above

1                   3        47       50
2                  18        53       29
3                  58        39        3
4                  21        61       18
5                  23        63       15
6                  19        65       14
7                  15        69       16

a No placement was reported for two students.
Table 4

For Each Instructional Criterion, Percentages of Students Placed
Below, the Same, and Above Achievement Test Scores (N=91)(a)

                  Curriculum-based Grade Scores Compared to
Instructional     Achievement Test Scores
Criterion         Below     Same      Above

1                 11.25     44.75     43.25
2                 26.50     51.50     21.?
3                 60.25     38.00      1.00
4                 39.25     46.50     13.50
5                 42.50     49.00      7.75
6                 40.00     51.75      7.50
7                 32.50     58.00      8.75

a Percentages are across reading series and across achievement tests
(WI and PC).
Table 5

Average Increase Per Grade Level Calculated on
Curriculum-based Measures

                  Average Increase Per Grade Level
Instructional     Series A   Series B   Average
Criterion                               Across Series

1                 1.90       1.76       1.83
2                 1.90       1.72       1.81
3                  .96        .94        .95
4                 1.76       1.74       1.75
5                 1.76       1.68       1.72
6                 1.88       1.64       1.76
7                 1.88       1.68       1.78
Figure 2. Within the Ginn 720 series, for each instructional criterion,
the average instructional score per grade level.
Figure 3. Within the Scott-Foresman Unlimited series, for each
instructional criterion, the average instructional score per grade level.